Lower PAC bound on Upper Confidence Bound-based Q-learning with examples

Authors

Jia-Shen Boon

Xiaomin Zhang

Abstract

Recently, there has been significant progress in understanding reinforcement learning in Markov decision processes (MDPs). We focus on improving Q-learning and analyze its sample complexity. We investigate the performance of tabular Q-learning, approximate Q-learning, and UCB-based Q-learning. We also derive a lower PAC bound of $\Omega\left(\frac{|S||A|}{\epsilon^2}\ln\frac{|A|}{\delta}\right)$ for UCB-based Q-learning. Two tasks, CartPole and Pac-Man, are each solved using these three methods. Results and discussion are presented at the end. Compared to its alternatives, UCB-based learning does better in exploration but loses its advantage in exploitation.
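As a rough illustration of what UCB-based action selection can look like in tabular Q-learning, the sketch below adds a count-based exploration bonus to each action value before taking the maximum. The bonus form `c * sqrt(ln t / N(s, a))`, the constant `c`, the toy chain environment `chain_step`, and all function names are assumptions made for illustration; none of them are taken from the paper, whose exact algorithm is not given in this abstract.

```python
import math
from collections import defaultdict


def ucb_action(q, counts, state, n_actions, t, c=1.0):
    """Return argmax_a Q(s,a) + c*sqrt(ln t / N(s,a)); untried actions go first."""
    best_action, best_score = 0, -math.inf
    for a in range(n_actions):
        n = counts[(state, a)]
        if n == 0:
            return a  # try every action at least once before using the bonus
        score = q[(state, a)] + c * math.sqrt(math.log(t) / n)
        if score > best_score:
            best_action, best_score = a, score
    return best_action


def chain_step(state, action, n_states=5):
    """Hypothetical toy chain: action 1 moves right, action 0 moves left.

    Reaching the right-most state gives reward 1 and ends the episode.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=200, alpha=0.1, gamma=0.9, n_actions=2, max_steps=50):
    """Tabular Q-learning where exploration comes from the UCB bonus."""
    q = defaultdict(float)     # Q(s, a), initialised to 0
    counts = defaultdict(int)  # N(s, a), visit counts
    t = 0                      # global step counter used in the bonus
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            t += 1
            action = ucb_action(q, counts, state, n_actions, t)
            counts[(state, action)] += 1
            next_state, reward, done = chain_step(state, action)
            # Standard Q-learning target; no bootstrapping on terminal steps.
            future = 0.0 if done else max(q[(next_state, a)] for a in range(n_actions))
            target = reward + gamma * future
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q, counts


if __name__ == "__main__":
    q, counts = train()
    print("Q(3, right) =", round(q[(3, 1)], 3))
```

Because rarely tried actions receive a larger bonus, the agent keeps revisiting them even after a good action has been found. This is one plausible reading of the abstract's remark that the UCB-based variant explores well but gives up some of its advantage in exploitation.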

Similar resources

A New Lower Bound for Completion Time Distribution Function of Stochastic PERT Networks

In this paper, a new method for developing a lower bound on exact completion time distribution function of stochastic PERT networks is provided that is based on simplifying the structure of this type of network. The designed mechanism simplifies network structure by arc duplication so that network distribution function can be calculated only with convolution and multiplication. The selection of...

On the Sample Complexity of Noise-Tolerant Learning

In this paper, we further characterize the complexity of noise-tolerant learning in the PAC model. Specifically, we show a general lower bound of $\Omega\left(\frac{\log(1/\delta)}{\varepsilon(1-2\eta)^2}\right)$ on the number of examples required for PAC learning in the presence of classification noise. Combined with a result of Simon, we effectively show that the sample complexity of PAC learning in the presence of classification noise...

PAC Bounds for Discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends line...

Near-optimal PAC bounds for discounted MDPs

We study upper and lower bounds on the sample-complexity of learning near-optimal behaviour in finite-state discounted Markov Decision Processes (MDPs). We prove a new bound for a modified version of Upper Confidence Reinforcement Learning (UCRL) with only cubic dependence on the horizon. The bound is unimprovable in all parameters except the size of the state/action space, where it depends lin...

Publication date: 2016
